Top-K Document Ranking and Sentence Similarity Retrieval System for COVID-19 Research Papers
Similar resources
Optimal Top-k Document Retrieval
Let D be a collection of D documents, which are strings over an alphabet of size σ, of total length n. We describe a data structure that uses linear space and reports the k most relevant documents that contain a query pattern P, which is a string of length p, in time O(p / log_σ n + k), which is optimal in the RAM model in the general case where lg D = Θ(log n), and involves a novel RAM-optimal suf...
Web Document Retrieval Using Sentence-Query Similarity
For the web document retrieval experiments in our TREC 2002 participation, we used two new methods. One is the use of anchor texts, which has been advocated by many researchers, although the methods they used differ from ours. The second is the use of sentence-query similarity. It has been known that the use of links for web retrieval did not show impressive improvement in performanc...
Top-K Color Queries for Document Retrieval
In this paper we describe a new efficient (in fact optimal) data structure for the top-K color problem. Each element of an array A is assigned a color c with priority p(c). For a query range [a, b] and a value K, we have to report K colors with the highest priorities among all colors that occur in A[a..b], sorted in reverse order by their priorities. We show that such queries can be answered in...
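To make the query semantics in the abstract above concrete, here is a minimal brute-force sketch in Python. It is not the optimal data structure the paper describes, only a naive baseline that scans the range on every query. The function name `top_k_colors`, the dictionary `priority`, and the sample data are illustrative assumptions, and the range [a, b] is taken to be inclusive and 0-indexed.

```python
import heapq


def top_k_colors(A, priority, a, b, K):
    """Naive baseline (an assumption, not the paper's structure).

    Scans A[a..b] (inclusive, 0-indexed), collects the distinct colors,
    and returns up to K of them in descending order of priority.
    """
    distinct = set(A[a:b + 1])  # each color is reported at most once
    return heapq.nlargest(K, distinct, key=lambda c: priority[c])


# Hypothetical example data.
A = ["red", "blue", "red", "green", "blue", "yellow"]
priority = {"red": 3, "blue": 1, "green": 7, "yellow": 5}

# Colors occurring in A[1..4] are blue, red, green; the two with the
# highest priorities are green (7) and red (3).
print(top_k_colors(A, priority, 1, 4, 2))
```

Each query here costs time proportional to the range length, which is the cost a dedicated index is meant to avoid.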
Sentence Ranking for Document Indexing
This article discusses a new document indexing scheme for information retrieval. For a structured (e.g., scientific) document, Pasi et al. proposed assigning varying weights to different sections according to their importance in the document. This concept is extended here to unstructured documents. Each sentence in a document is initially assigned weights (significance in the document) with the help of a...
Top-k document retrieval in optimal space
We present an index for top-k most frequent document retrieval whose space is |CSA| + o(n) + D log(n/D) + O(D) bits, and its query time is O(log k log^(2+ε) n) per reported document, where D is the number of documents, n is the sum of lengths of the documents, and |CSA| is the space of the compressed suffix array for the documents. This improves over previous results for this problem, whose space complex...
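As a point of comparison for the problem statement above, the following Python sketch answers a top-k most frequent document query by brute force. It is not the compressed index from the abstract: it rescans every document per query and uses no suffix array. The names `count_occurrences` and `top_k_documents`, the tie-breaking rule (lower document id first), and the sample documents are assumptions made for illustration.

```python
import heapq


def count_occurrences(text, pattern):
    """Count possibly overlapping occurrences of pattern in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)  # step by 1 to allow overlaps
    return count


def top_k_documents(docs, pattern, k):
    """Return up to k (doc_id, frequency) pairs, most frequent first.

    Only documents containing the pattern are reported; ties are broken
    by lower doc_id (an assumption, the abstract does not specify one).
    """
    hits = []
    for doc_id, text in enumerate(docs):
        freq = count_occurrences(text, pattern)
        if freq > 0:
            hits.append((doc_id, freq))
    return heapq.nlargest(k, hits, key=lambda h: (h[1], -h[0]))


# Hypothetical example data.
docs = ["abracadabra", "banana", "cab"]
print(top_k_documents(docs, "a", 2))
```

The scan takes time proportional to the total length n of the collection per query, whereas the index in the abstract reports each document in polylogarithmic time.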
Journal
Journal title: Social Science Research Network
Year: 2022
ISSN: 1556-5068
DOI: https://doi.org/10.2139/ssrn.4193274